AUC Optimization vs. Error Rate Minimization
Authors
Abstract
The area under an ROC curve (AUC) is a criterion used in many applications to measure the quality of a classification algorithm. However, the objective function optimized in most of these algorithms is the error rate and not the AUC value. We give a detailed statistical analysis of the relationship between the AUC and the error rate, including the first exact expression of the expected value and the variance of the AUC for a fixed error rate. Our results show that the average AUC is monotonically increasing as a function of the classification accuracy, but that the standard deviation for uneven distributions and higher error rates is noticeable. Thus, algorithms designed to minimize the error rate may not lead to the best possible AUC values. We show that, under certain conditions, the global function optimized by the RankBoost algorithm is exactly the AUC. We report the results of our experiments with RankBoost on several datasets, demonstrating the benefits of an algorithm specifically designed to globally optimize the AUC over other existing algorithms that optimize an approximation of the AUC or only locally optimize the AUC.

1 Motivation

In many applications, the overall classification error rate is not the most pertinent performance measure; criteria such as ordering or ranking are often more appropriate. Consider, for example, the list of relevant documents returned by a search engine for a specific query. That list may contain several thousand documents, but, in practice, only the top fifty or so are examined by the user. Thus, a search engine's ranking of the documents is more critical than the accuracy of its classification of all documents as relevant or not. More generally, for a binary classifier assigning a real-valued score to each object, a better correlation between output scores and the probability of correct classification is highly desirable.
A natural criterion or summary statistic often used to measure the ranking quality of a classifier is the area under an ROC curve (AUC) [8].¹ However, the objective function optimized by most classification algorithms is the error rate and not the AUC. Recently, several algorithms have been proposed for maximizing the AUC value locally [4] or maximizing some approximations of the global AUC value [9, 15], but, in general, these algorithms do not obtain AUC values significantly better than those obtained by an algorithm designed to minimize the error rate. Thus, it is important to determine the relationship between the AUC values and the error rate.

∗ This author's new address is: Google Labs, 1440 Broadway, New York, NY 10018, [email protected].

¹ The AUC value is equivalent to the Wilcoxon-Mann-Whitney statistic [8] and closely related to the Gini index [1]. It has been re-invented under the name of L-measure by [11], as already pointed out by [2], and slightly modified under the name of Linear Ranking by [13, 14].
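Since the AUC is equivalent to the Wilcoxon-Mann-Whitney statistic, it can be computed directly as the fraction of positive-negative pairs that a classifier's scores rank correctly. The sketch below (function names, threshold, and toy scores are illustrative, not taken from the paper) also exhibits two score assignments with the same error rate but very different AUC values, which is the gap between the two criteria discussed above:

```python
def error_rate(scores_pos, scores_neg, threshold=0.5):
    """Fraction of examples falling on the wrong side of the threshold."""
    errors = sum(s < threshold for s in scores_pos) \
           + sum(s >= threshold for s in scores_neg)
    return errors / (len(scores_pos) + len(scores_neg))

def auc(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    positive-negative pairs ranked correctly, counting ties as 1/2."""
    wins = sum((sp > sn) + 0.5 * (sp == sn)
               for sp in scores_pos for sn in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Two toy classifiers, each misclassifying exactly one of four examples:
a_pos, a_neg = [0.9, 0.4], [0.2, 0.1]   # one false negative, perfect ranking
b_pos, b_neg = [0.9, 0.1], [0.3, 0.2]   # one false negative, poor ranking

assert error_rate(a_pos, a_neg) == error_rate(b_pos, b_neg) == 0.25
print(auc(a_pos, a_neg), auc(b_pos, b_neg))  # 1.0 0.5
```

Both classifiers have an error rate of 1/4, yet one ranks every positive above every negative (AUC = 1.0) while the other ranks half the pairs incorrectly (AUC = 0.5), illustrating why minimizing the error rate alone need not yield the best AUC.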
Related Papers
Directly and Efficiently Optimizing Prediction Error and AUC of Linear Classifiers
The predictive quality of machine learning models is typically measured in terms of their (approximate) expected prediction error or the so-called Area Under the Curve (AUC) for a particular data distribution. However, when the models are constructed by means of empirical risk minimization, surrogate functions such as the logistic loss are optimized instead. This is done because the empirica...
Efficient AUC Optimization for Information Ranking Applications
Adequate evaluation of an information retrieval system to estimate future performance is a crucial task. Area under the ROC curve (AUC) is widely used to evaluate the generalization of a retrieval system. However, the objective function optimized in many retrieval systems is the error rate and not the AUC value. This paper provides an efficient and effective non-linear approach to optimize AUC ...
Context-dependent memory decay is evidence of effort minimization in motor learning: a computational study
Recent theoretical models suggest that motor learning includes at least two processes: error minimization and memory decay. While learning a novel movement, a motor memory of the movement is gradually formed to minimize the movement error between the desired and actual movements in each training trial, but the memory is slightly forgotten in each trial. The learning effects of error minimizatio...
Maximization of AUC and Buffered AUC in Classification
This paper utilizes a new concept, called Buffered Probability of Exceedance (bPOE), to introduce an alternative to the Area Under the Receiver Operating Characteristic Curve (AUC) performance metric called Buffered AUC (bAUC). Central to the creation of bAUC is a new technique for calculation and optimization of bPOE. We show this formula to be easily integrable into optimization frameworks, o...
Rate-Distortion Optimal Shape Coding Using B-Spline Snakes
This paper addresses the problem of joint rate-distortion optimization of spline curves for optimal video object shape coding. It proposes a B-spline snake model which minimizes rate-distortion energy to find optimal spline coefficients. A continuous model of contour distortion is used to model the shape reconstruction error. A continuous model of the coding bitrate is based on the relative i...
Publication date: 2003